Conversation

@brownbaerchen
Collaborator

@brownbaerchen brownbaerchen commented Jan 7, 2026

Due Diligence

  • General:
  • Implementation:
    • unit tests: all split configurations tested
    • unit tests: multiple dtypes tested
    • NEW unit tests: MPS tested (1 MPI process, 1 GPU)
    • benchmarks: created for new functionality
    • benchmarks: performance improved or maintained
    • documentation updated where needed

Description

Python uses __repr__ and __str__ to give textual information about objects. The intention is that __repr__ gives output for developers and __str__ gives output for users. In heat arrays, this means __repr__ tells you, for each MPI rank, what kind of data is contained in the array, while __str__ tells you the actual data.

That works well in scripts, but it is a problem when people use heat interactively: evaluating an array at the prompt shows its __repr__, yet interactive users usually want to see the data rather than the local shape.
For example:

import heat as ht
a = ht.arange(4)
a  # the interactive prompt displays a.__repr__()
f"{a}"  # formats a using a.__str__()

When doing something like this, you tend to want to know the contents of the array in both cases.
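
To make the distinction concrete, here is a minimal sketch using a hypothetical `Box` class (not part of heat). It only illustrates which method Python calls in each situation: the interactive prompt displays `repr(obj)`, while `print`, `str`, and f-strings use `__str__` by default.

```python
class Box:
    """Hypothetical stand-in for an array type; not part of heat."""

    def __init__(self, data):
        self.data = data

    def __repr__(self):
        # Developer-facing: describes the object
        return f"<Box(len={len(self.data)})>"

    def __str__(self):
        # User-facing: shows the contents
        return f"Box({self.data})"


b = Box([0, 1, 2, 3])
shown_at_prompt = repr(b)   # what the interactive prompt displays for `b`
shown_by_print = str(b)     # what print(b) writes
shown_by_fstring = f"{b}"   # f-strings fall back to __str__ by default
```

With this class, typing `b` at the prompt shows the length only, whereas `print(b)` shows the contents, which is the same mismatch described above for heat arrays.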

This PR appends the contents of the array to the __repr__ output. Before, you got:

$ mpirun -np 2 python -c "import heat as ht; a = ht.arange(3 * 4, split=0).reshape((3, 2, 2)); print(a.__repr__())"
<DNDarray(MPI-rank: 0, Shape: (3, 2, 2), Split: 0, Local Shape: (2, 2, 2), Device: cpu:0, Dtype: int32)>
<DNDarray(MPI-rank: 1, Shape: (3, 2, 2), Split: 0, Local Shape: (1, 2, 2), Device: cpu:0, Dtype: int32)>

With the changes introduced here, you get:

$ mpirun -np 2 python -c "import heat as ht; a = ht.arange(3 * 4, split=0).reshape((3, 2, 2)); print(a.__repr__())" 
DNDarray(MPI-rank: 1, Shape: (3, 2, 2), Split: 0, Local Shape: (1, 2, 2), Device: cpu:0, Dtype: int32
         [[[ 8,  9],
           [10, 11]]])
DNDarray(MPI-rank: 0, Shape: (3, 2, 2), Split: 0, Local Shape: (2, 2, 2), Device: cpu:0, Dtype: int32
         [[[0, 1],
           [2, 3]],

          [[4, 5],
           [6, 7]]])

Compare this to the __str__ output:

$ mpirun -np 2 python -c "import heat as ht; a = ht.arange(3 * 4, split=0).reshape((3, 2, 2)); print(a)"           
DNDarray([[[ 0,  1],
           [ 2,  3]],

          [[ 4,  5],
           [ 6,  7]],

          [[ 8,  9],
           [10, 11]]], dtype=ht.int32, device=cpu:0, split=0)

Let me know what you think about this.

Issue/s resolved: #2018

Changes proposed:

  • Print both debug information and content information in interactive sessions
  • Test for correct output in __repr__ with multiple processes

@github-actions
Contributor

github-actions bot commented Jan 7, 2026

Thank you for the PR!

@brownbaerchen
Collaborator Author

I am not sure why this fails. On Jureca I get:

(heat_venv) [me@jrlogin02 heat]$ python
Python 3.12.3 (main, Jul 26 2024, 17:40:49) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import heat as ht
/path/heat/heat/core/_config.py:108: UserWarning: Heat has CUDA GPU-support (PyTorch version 2.5.1+cu124 and `torch.cuda.is_available() = True`), but CUDA-awareness of MPI could not be detected. This may lead to performance degradation as direct MPI-communication between GPUs is not possible.
  warnings.warn(
>>> a = ht.arange(4, device='gpu')
>>> a
DNDarray([0, 1, 2, 3], dtype=ht.int32, device=gpu:0, split=None)
>>> a.__repr__(force_debug_output=True)
'<DNDarray(MPI-rank: 0, Shape: (4,), Split: None, Local Shape: (4,), Device: gpu:0, Dtype: int32)>'
>>> a.__str__() == a.__repr__()
True

I get the same result with a CPU array, and the tests pass both in serial and in parallel.

So it seems to work on GPUs as well. Does anyone have any ideas? I will not spend time on this until we decide whether we even want this behavior.

@brownbaerchen brownbaerchen marked this pull request as draft January 12, 2026 10:27
@brownbaerchen
Collaborator Author

We decided to try for a compromise. Rather than printing either only the content or only the debug information, __repr__ should show both the debug information and some of the array's content. In particular, we must make sure that no communication happens in __repr__, so that we don't introduce deadlocks.
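
As a rough illustration of the "no communication" constraint, the sketch below uses a hypothetical `LocalChunk` class. It is not heat's DNDarray and does not reflect the actual implementation; all names are assumptions. The point is that `__repr__` reads only attributes already stored on the calling rank, so it stays safe even when only one rank calls it, whereas a collective operation would wait for all other ranks.

```python
class LocalChunk:
    """Hypothetical stand-in for one rank's part of a distributed array.

    This is not heat's DNDarray; all names here are illustrative.
    """

    def __init__(self, rank, global_shape, split, local_data):
        self.rank = rank
        self.global_shape = global_shape
        self.split = split
        self.local_data = local_data  # only the data this rank already owns

    def __repr__(self):
        # Uses only attributes stored on this rank: no gather, no barrier,
        # so calling it on a single rank (e.g. in a debugger) cannot hang.
        return (
            f"LocalChunk(rank: {self.rank}, shape: {self.global_shape}, "
            f"split: {self.split}, local data: {self.local_data})"
        )


# Two simulated "ranks" of a length-3 array split along axis 0
chunks = [LocalChunk(0, (3,), 0, [0, 1]), LocalChunk(1, (3,), 0, [2])]
texts = [repr(c) for c in chunks]
```

Each simulated rank reports the global metadata plus its own slice, and neither needs to know what the other holds.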

@brownbaerchen brownbaerchen changed the title from "Printing debug information in __repr__ only when a debugger has set a trace" to "Print contents of array in __repr__" Jan 13, 2026
@brownbaerchen brownbaerchen marked this pull request as ready for review January 13, 2026 14:09
@codecov

codecov bot commented Jan 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.68%. Comparing base (95cb222) to head (7d97755).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2091   +/-   ##
=======================================
  Coverage   91.68%   91.68%           
=======================================
  Files          89       89           
  Lines       13945    13946    +1     
=======================================
+ Hits        12786    12787    +1     
  Misses       1159     1159           
Flag Coverage Δ
unit 91.68% <100.00%> (+<0.01%) ⬆️

@ClaudiaComito ClaudiaComito added this to the 1.7.1 milestone Jan 14, 2026
@ClaudiaComito ClaudiaComito moved this from Todo to In Progress in Roadmap Jan 14, 2026
mtar previously approved these changes Jan 14, 2026
Collaborator

@mtar mtar left a comment

My minor suggestion is to add the tag 'data:' before showing the local data.

@github-project-automation github-project-automation bot moved this from In Progress to Merge queue in Roadmap Jan 14, 2026
@brownbaerchen
Collaborator Author

> My minor suggestion is to add the tag 'data:' before showing the local data.

I added the tag, so the example from the description now reads:

$ mpirun -np 2 python -c "import heat as ht; a = ht.arange(3 * 4, split=0).reshape((3, 2, 2)); print(a.__repr__())"
DNDarray(MPI-rank: 1, Shape: (3, 2, 2), Split: 0, Local Shape: (1, 2, 2), Device: cpu:0, Dtype: int32, Data:
         [[[ 8,  9],
           [10, 11]]])
DNDarray(MPI-rank: 0, Shape: (3, 2, 2), Split: 0, Local Shape: (2, 2, 2), Device: cpu:0, Dtype: int32, Data:
         [[[0, 1],
           [2, 3]],

          [[4, 5],
           [6, 7]]])

Putting the tag on the line where the data begins somehow looked odd to me.
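
Note that the ranks print in arbitrary order (rank 1 appears before rank 0 above), so anything that checks this output should not depend on line order. A minimal sketch of one way to handle that, assuming the header format shown above; the regular expression and the helper `local_shapes_by_rank` are hypothetical and not part of heat or its test suite:

```python
import re

# Header format copied from the example output above
HEADER = re.compile(
    r"^DNDarray\(MPI-rank: (\d+), .*Local Shape: (\([^)]*\)), .*Data:$"
)

output = """\
DNDarray(MPI-rank: 1, Shape: (3, 2, 2), Split: 0, Local Shape: (1, 2, 2), Device: cpu:0, Dtype: int32, Data:
         [[[ 8,  9],
           [10, 11]]])
DNDarray(MPI-rank: 0, Shape: (3, 2, 2), Split: 0, Local Shape: (2, 2, 2), Device: cpu:0, Dtype: int32, Data:
         [[[0, 1],
           [2, 3]],

          [[4, 5],
           [6, 7]]])
"""


def local_shapes_by_rank(text):
    """Map each MPI rank to the local shape string found in its header line."""
    shapes = {}
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:  # data lines start with spaces and are skipped
            shapes[int(match.group(1))] = match.group(2)
    return shapes


shapes = local_shapes_by_rank(output)
```

Keying the result by rank makes the comparison independent of which process happened to write first.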

@brownbaerchen brownbaerchen requested a review from mtar January 15, 2026 12:22
Collaborator

@mtar mtar left a comment

Looks fine 👍

@brownbaerchen brownbaerchen merged commit 25da82d into main Jan 15, 2026
56 checks passed
@github-project-automation github-project-automation bot moved this from Merge queue to Done in Roadmap Jan 15, 2026
@brownbaerchen brownbaerchen deleted the bugs/2018-_Bug___repr___vs___str___behaviour branch January 15, 2026 13:02
@github-actions
Contributor

Successfully created backport PR for stable:

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: __repr__ vs __str__ behaviour.

4 participants